Smart Paper Notes

Probabilistic Random Indexing for Continuous Event Detection

Yashank Singh, Niladri Chatterjee

Categories: Machine Learning | Natural Language Processing

2020-08-28

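This paper explores a novel variant of Random Indexing (RI) based representations for encoding language data, with a view to using them in dynamic scenarios where events happen continuously. Because the size of the representation in the general one-hot encoding approach grows with the size of the vocabulary, such representations do not scale to online use on high-volume dynamic data. On the other hand, owing to the dynamic nature of text data, existing pre-trained embedding models are not suited to detecting the occurrence of new events. This work addresses the problem with a novel RI representation: imposing a probability distribution on the number of randomized entries yields a class of RI representations. It also gives a rigorous analysis of how well these representations encode semantic information, in terms of the probability of orthogonality. Building on these ideas, we propose an algorithm, whose cost is characterized in terms of the vocabulary size, that tracks the semantic relationships of a query word in order to suggest events relevant to the word in question. We ran simulations of the proposed algorithm on tweet data specific to three different events and present our findings. The proposed probabilistic RI representations are found to be faster and more scalable than Bag of Words (BoW) embeddings while maintaining accuracy in depicting semantic relationships.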
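
As a rough illustration of the random-indexing idea summarized above, the sketch below gives each word a sparse ternary index vector in which every position is nonzero independently with probability `p`, so the number of nonzero entries is itself random (binomial). This is an assumed stand-in for the class of distributions the paper studies, not the authors' algorithm; the function names and the parameters `dim`, `p`, and `window` are hypothetical choices made for the example.

```python
import math
import random


def index_vector(dim, p, rng):
    """Sparse ternary index vector stored as {position: +1 or -1}.

    Assumption: each position is nonzero independently with probability p.
    """
    return {i: rng.choice((-1, 1)) for i in range(dim) if rng.random() < p}


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u
    return sum(val * v.get(i, 0) for i, val in u.items())


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot(u, v) / (nu * nv)


def build_context_vectors(sentences, dim=512, p=0.02, window=2, seed=0):
    """Add the index vectors of neighbouring words into each word's context vector."""
    rng = random.Random(seed)
    index, context = {}, {}
    for tokens in sentences:
        # New words get a fresh index vector; the dimension never changes.
        for tok in tokens:
            if tok not in index:
                index[tok] = index_vector(dim, p, rng)
                context[tok] = {}
        for pos, tok in enumerate(tokens):
            lo = max(0, pos - window)
            hi = min(len(tokens), pos + window + 1)
            for j in range(lo, hi):
                if j == pos:
                    continue
                for i, val in index[tokens[j]].items():
                    context[tok][i] = context[tok].get(i, 0) + val
    return index, context
```

Two sparse random vectors rarely share nonzero positions, so their dot product is usually zero. This near-orthogonality is what lets a fixed dimension serve a growing vocabulary, and `cosine` applied to two context vectors gives a simple relatedness score for a query word.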

Translated by Google Translate

AdverSAR: Adversarial Search and Rescue via Multi-Agent Reinforcement Learning

Aowabin Rahman, Arnab Bhattacharya, Thiagarajan Ramachandran, Sayak Mukherjee, Himanshu Sharma, Ted Fujimoto, Samrat Chatterjee

Categories: Robotics | Machine Learning

2022-12-20

Search and Rescue (SAR) missions in remote environments often employ autonomous multi-robot systems that learn, plan, and execute a combination of local single-robot control actions, group primitives, and global mission-oriented coordination and collaboration. Often, SAR coordination strategies are manually designed by human experts who can remotely control the multi-robot system and enable semi-autonomous operations. However, in remote environments where connectivity is limited and human intervention is often not possible, decentralized collaboration strategies are needed for fully-autonomous operations. Nevertheless, decentralized coordination may be ineffective in adversarial environments due to sensor noise, actuation faults, or manipulation of inter-agent communication data. In this paper, we propose an algorithmic approach based on adversarial multi-agent reinforcement learning (MARL) that allows robots to efficiently coordinate their strategies in the presence of adversarial inter-agent communications. In our setup, the objective of the multi-robot team is to discover targets strategically in an obstacle-strewn geographical area by minimizing the average time needed to find the targets. It is assumed that the robots have no prior knowledge of the target locations, and they can interact with only a subset of neighboring robots at any time. Based on the centralized training with decentralized execution (CTDE) paradigm in MARL, we utilize a hierarchical meta-learning framework to learn dynamic team-coordination modalities and discover emergent team behavior under complex cooperative-competitive scenarios. The effectiveness of our approach is demonstrated on a collection of prototype grid-world environments with different specifications of benign and adversarial agents, target locations, and agent rewards.


Observability-aware online multi-lidar extrinsic calibration

Sandipan Das, Ludvig af Klinteberg, Maurice Fallon, Saikat Chatterjee

Categories: Robotics

2022-12-19

Accurate and robust extrinsic calibration is necessary for deploying autonomous systems that need multiple sensors for perception. In this paper, we present a robust system for real-time extrinsic calibration of multiple lidars in the vehicle base frame without the need for any fiducial markers or features. We base our approach on matching absolute GNSS poses and estimated lidar poses in real time. Comparing rotation components makes the solution more robust than the traditional least-squares approach, which compares translation components only. Additionally, instead of comparing all corresponding poses, we select poses comprising maximum mutual information based on our novel observability criteria. This allows us to identify a subset of the poses helpful for real-time calibration. We also provide stopping criteria for ensuring calibration completion. To validate our approach, extensive tests were carried out on data collected using Scania test vehicles (7 sequences totalling ~6.5 km). The results presented in this paper show that our approach is able to accurately determine the extrinsic calibration for various combinations of sensor setups.


Quantization-aware Interval Bound Propagation for Training Certifiably Robust Quantized Neural Networks

Mathias Lechner, Đorđe Žikelić, Krishnendu Chatterjee, Thomas A. Henzinger, Daniela Rus

Categories: Machine Learning

2022-11-29

We study the problem of training and certifying adversarially robust quantized neural networks (QNNs). Quantization is a technique for making neural networks more efficient by running them using low-bit integer arithmetic and is therefore commonly adopted in industry. Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization, so certification of the quantized representation is necessary to guarantee robustness. In this work, we present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs. Inspired by advances in robust learning of non-quantized networks, our training algorithm computes the gradient of an abstract representation of the actual network. Unlike existing approaches, our method can handle the discrete semantics of QNNs. Based on QA-IBP, we also develop a complete verification procedure for verifying the adversarial robustness of QNNs, which is guaranteed to terminate and produce a correct answer. Compared to existing approaches, the key advantage of our verification procedure is that it runs entirely on GPU or other accelerator devices. We demonstrate experimentally that our approach significantly outperforms existing methods and establishes a new state of the art for training and certifying the robustness of QNNs.
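
To make the term interval bound propagation concrete, here is a minimal sketch of the standard (non-quantization-aware) technique: an input box is pushed through an affine layer and a ReLU, and then through a round-and-clamp step. It is not the QA-IBP method from the paper. The `quantize_bounds` step, its `scale`, `qmin`, and `qmax` parameters, and all function names are assumptions made for illustration; the only fact it relies on is that rounding and clamping are monotone, so interval endpoints map to interval endpoints.

```python
def affine_bounds(lower, upper, weights, bias):
    """Propagate the box [lower, upper] through y = W x + b.

    For each weight, a non-negative entry takes the lower input bound for
    the output lower bound; a negative entry takes the upper input bound.
    """
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def relu_bounds(lower, upper):
    """ReLU is monotone, so it is applied to each endpoint."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


def quantize_bounds(lower, upper, scale, qmin, qmax):
    """Assumed fake-quantization step: round to the grid, clamp to [qmin, qmax].

    Round-and-clamp is monotone non-decreasing, so endpoints stay endpoints.
    """
    def q(x):
        return min(qmax, max(qmin, round(x / scale))) * scale

    return [q(l) for l in lower], [q(u) for u in upper]
```

A network is certified for a given input box when, after propagating through every layer, the lower bound of the correct class's output exceeds the upper bound of every other class.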


Beyond Counting Datasets: A Survey of Multilingual Dataset Construction and Necessary Resources

Xinyan Velocity Yu, Akari Asai, Trina Chatterjee, Junjie Hu, Eunsol Choi

Categories: Natural Language Processing | Artificial Intelligence

2022-11-28

While the NLP community is generally aware of resource disparities among languages, we lack research that quantifies the extent and types of such disparity. Prior surveys estimating the availability of resources based on the number of datasets can be misleading as dataset quality varies: many datasets are automatically induced or translated from English data. To provide a more comprehensive picture of language resources, we examine the characteristics of 156 publicly available NLP datasets. We manually annotate how they are created, including input text and label sources and tools used to build them, and what they study, tasks they address and motivations for their creation. After quantifying the qualitative NLP resource gap across languages, we discuss how to improve data collection in low-resource languages. We survey language-proficient NLP researchers and crowd workers per language, finding that their estimated availability correlates with dataset availability. Through crowdsourcing experiments, we identify strategies for collecting high-quality multilingual data on the Mechanical Turk platform. We conclude by making macro and micro-level suggestions to the NLP community and individual researchers for future multilingual data development.


A survey of some recent developments in measures of association

Sourav Chatterjee

Categories: Machine Learning | Machine Learning (Statistics)

2022-11-09

This paper surveys some recent developments in measures of association related to a new coefficient of correlation introduced by the author. A straightforward extension of this coefficient to standard Borel spaces (which include all Polish spaces), overlooked in the literature so far, is proposed at the end of the survey.
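
For readers unfamiliar with the coefficient, the sketch below assumes the abstract refers to Chatterjee's rank coefficient, commonly written $\xi_n$. In the case with no ties, sort the pairs by $x$, let $r_i$ be the rank of the $i$-th $y$ in that order, and compute $\xi_n = 1 - 3\sum_{i=1}^{n-1}|r_{i+1}-r_i| / (n^2-1)$. The function name is hypothetical, and the tie-handling variant of the formula is deliberately left out.

```python
def xi_coefficient(x, y):
    """Rank correlation xi_n for paired samples without ties.

    Assumption: all x values are distinct and all y values are distinct.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this sketch does not handle ties")
    # Order the pairs by x, then read off the ranks of y in that order.
    order = sorted(range(n), key=lambda i: x[i])
    rank_of = {value: k + 1 for k, value in enumerate(sorted(y))}
    r = [rank_of[y[i]] for i in order]
    total = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1 - 3 * total / (n * n - 1)
```

For a strictly monotone relationship the sum equals $n-1$, so the value is $(n-2)/(n+1)$, which approaches 1 as $n$ grows. Unlike Pearson or Spearman correlation, the coefficient is not symmetric in `x` and `y`: it measures how well `y` is a function of `x`.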


Centaur: Federated Learning for Constrained Edge Devices

Fan Mo, Mohammad Malekzadeh, Soumyajit Chatterjee, Fahim Kawsar, Akhil Mathur

Categories: Machine Learning

2022-11-08

Federated learning (FL) on deep neural networks facilitates new applications at the edge, especially for wearable and Internet-of-Things devices. Such devices capture a large and diverse amount of data, but they have memory, compute, power, and connectivity constraints that hinder their participation in FL. We propose Centaur, a multitier FL framework that enables ultra-constrained devices to participate efficiently in FL on large neural nets. Centaur combines two major ideas: (i) a data selection scheme that chooses a portion of samples to accelerate learning, and (ii) a partition-based training algorithm that integrates both constrained and powerful devices owned by the same user. Evaluations on four benchmark neural nets and three datasets show that Centaur gains ~10% higher accuracy than local training on constrained devices, with ~58% energy saving on average. Our experimental results also demonstrate the superior efficiency of Centaur when dealing with imbalanced data, client participation heterogeneity, and various network connection probabilities.


Learning Control Policies for Stochastic Systems with Reach-avoid Guarantees

Đorđe Žikelić, Mathias Lechner, Thomas A. Henzinger, Krishnendu Chatterjee

Categories: Machine Learning | Artificial Intelligence

2022-10-11

We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold $p\in[0,1]$ over the infinite time horizon. Our method leverages advances in machine learning literature and it represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems -- it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on $3$ stochastic non-linear reinforcement learning tasks.


Deep Linear Networks can Benignly Overfit when Shallow Ones Do

Niladri S. Chatterji, Philip M. Long

Categories: Machine Learning | Artificial Intelligence | Machine Learning (Statistics)

2022-09-19

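We bound the excess risk of interpolating deep linear networks trained using gradient flow. In a setting previously used to establish risk bounds for the minimum $\ell_2$-norm interpolant, we show that randomly initialized deep linear networks can closely approximate or even match known bounds for the minimum $\ell_2$-norm interpolant. Our analysis also reveals that interpolating deep linear models have exactly the same conditional variance as the minimum $\ell_2$-norm solution. Since the noise affects the excess risk only through the conditional variance, this implies that depth does not improve the algorithm's ability to "hide the noise". Our simulations verify that aspects of our bounds reflect typical behavior for simple data distributions. We also find that similar phenomena are seen in simulations with ReLU networks, although the situation there is more nuanced.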
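
For reference, the minimum $\ell_2$-norm interpolant mentioned above has a standard closed form. The notation here is chosen for this note and is not taken from the paper: assume a data matrix $X \in \mathbb{R}^{n \times d}$ with $d \ge n$ and full row rank, and responses $y \in \mathbb{R}^n$. Then

$$
\hat{\theta} \;=\; \arg\min_{\theta \in \mathbb{R}^d} \left\{ \|\theta\|_2 \,:\, X\theta = y \right\} \;=\; X^\top \left( X X^\top \right)^{-1} y .
$$

Among all parameter vectors that fit the training data exactly, this is the one of smallest Euclidean norm, and it is the baseline against which the abstract compares deep linear networks.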


Multi-segmented Adaptive Feet for Versatile Legged Locomotion in Natural Terrain

Abhishek Chatterjee, An Mo, Bernadett Kiss, Emre Cemal Gonen, Alexander Badri-Spröwitz

Categories: Robotics

2022-09-18

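Most legged robots are built with leg structures made of serially mounted links and actuators, and are controlled through complex controllers and sensor feedback. In contrast, animals developed multi-segment legs, mechanical coupling between joints, and multi-segmented feet. They run with agility over all terrains, arguably with simpler locomotion control. Here we focus on developing foot mechanisms that resist slipping and sinking, including in natural terrain. We present first results for multi-segment feet mounted on a bird-inspired robot leg with multi-joint mechanical tendon coupling. Our one- and two-segment, mechanically adaptive feet sustain viable horizontal forces on multiple soft and hard substrates before starting to slip. We also observe that segmented feet reduce sinking on soft substrates compared to ball-feet and cylinder-feet. We report how multi-segmented feet provide a range of viable centre-of-pressure points that is well suited to bipedal robots, and also to quadruped robots on slopes and natural terrain. Our results also offer a functional understanding of segmented feet in animals such as ratite birds.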
